Add multi-period DiD support by igerber · Pull Request #6 · igerber/diff-diff

igerber · 2026-01-02T15:40:24Z

Extend the library to optionally work across multiple time periods:

Add MultiPeriodDiD estimator that handles multiple pre/post periods
Add MultiPeriodDiDResults for period-specific and average ATT
Add PeriodEffect dataclass for individual period treatment effects
Support for period dummies and treatment × period interactions
Compute average ATT with proper covariance-adjusted standard errors
Allow custom reference period selection
Auto-infer post-periods when not specified (last half of periods)
Full support for covariates, fixed effects, and absorbed FE
Cluster-robust and heteroskedasticity-robust standard errors
Comprehensive test suite with 28 new tests (70 total passing)

Extend the library to optionally work across multiple time periods: - Add MultiPeriodDiD estimator that handles multiple pre/post periods - Add MultiPeriodDiDResults for period-specific and average ATT - Add PeriodEffect dataclass for individual period treatment effects - Support for period dummies and treatment × period interactions - Compute average ATT with proper covariance-adjusted standard errors - Allow custom reference period selection - Auto-infer post-periods when not specified (last half of periods) - Full support for covariates, fixed effects, and absorbed FE - Cluster-robust and heteroskedasticity-robust standard errors - Comprehensive test suite with 28 new tests (70 total passing)

Revised review reflects: - #1, #4 verified as non-issues (correct by design) - #3, #5, #6, #8, #13 addressed in commit e40d6b4 - Updated recommendation to approve and merge - Remaining items are low-priority style suggestions for future PRs

Addresses axis B findings #6 and #7 from the silent-failures audit: trop_global.py:448 outer alternating-min loop, trop_global.py:466 hard-coded range(20) inner FISTA loop, and trop_local.py:680 alternating-minimization loop all exited silently on max_iter exhaustion, returning the current iterate as if converged. - trop_global._solve_global_with_lowrank: thread a converged flag through the outer loop; count non-convergence events from the inner FISTA and surface the count in the outer warning for diagnostic context. One warn_if_not_converged call per solver invocation. - trop_local._estimate_model: thread a converged flag through the outer alternating-min loop; call warn_if_not_converged on exhaustion. - REGISTRY updated under TROP. New TestTROPConvergenceWarnings class (4 tests) exercises both global and local paths with forced non-convergence (max_iter=1, tol=1e-15) and a convergent negative control. Notable: the default TROP local config (max_iter=100, tol=1e-6) does not converge within max_iter on typical synthetic panels, so this PR surfaces a previously silent non-convergence that affected routine user fits. No numerical change in the returned iterate; the warning is additive. Axis-B regression-lint baseline: 5 -> 2 silent range(max_iter) loops remaining (minor loops in honest_did/power not yet addressed). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

P0: bias_corrected_local_linear now routes the CI through `safe_inference()` so degenerate cases with `se_robust <= 0` or non-finite `se_robust` (e.g., exact-fit / constant-y) return `(NaN, NaN)` rather than a misleading zero-width or infinite CI. Matches the repo-wide inference contract (CLAUDE.md Key Design Pattern #6). P1: Auto-bandwidth path now calls `lpbwselect_mse_dpi` directly with `cluster`, `vce`, and `nnmatch` forwarded. Previously it went through `mse_optimal_bandwidth` which hard-codes unclustered / nn / nnmatch=3, silently mismatching the downstream `lprobust` fit's reported estimator. Tests added: TestNaNSafeCI (constant-y + near-zero-SE) and TestAutoBandwidthForwardsParameters (cluster+auto, vce='hc1'+auto, nnmatch=5+auto), all asserting the selected bandwidth changes when the corresponding parameter changes (catches silent fallback). Also: suppress spurious BLAS FPE warnings in lprobust_bw's hc/hc2/hc3 branch (numpy issue #21432 pattern), newly reachable via the wired-through vce='hc1' auto-bandwidth path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Close BR/DR gap #6: target-parameter block in schemas

Wave 3 #6+#7 of the dCDH by_path follow-up sequence (after PR #378 shipped #5 by_path + controls). Removes the two NotImplementedError gates at chaisemartin_dhaultfoeuille.py:1014-1023 and adds: - New `path_cumulated_event_study` Results field surfacing the cumulated level effect delta_l per path under trends_linear=True (mirrors the global linear_trends_effects cumulation; this is what R returns under did_multiplegt_dyn(..., by_path, trends_lin)). - `set_ids` parameter threaded through the four per-path IF helpers so trends_nonparam's set-restricted control pool reaches per-path analytical SE, bootstrap, placebos, and sup-t bands automatically. - to_dataframe(level="by_path") gains cumulated_effect / cumulated_se columns (always present, NaN-when-None — mirrors cband_*). - summary() renders a per-path "Cumulated Level Effects" sub-section. Validated against R DIDmultiplegtDYN 2.3.3 via two new golden scenarios: - single_baseline_multi_path_by_path_trends_lin (custom DGP: F_g >= 4, cohort-single-path, n_periods=13) — per-path cumulated point estimates match R bit-exactly (POINT_RTOL=1e-9), cumulated SE within ±20% - multi_path_reversible_by_path_trends_nonparam — per-path point estimates AND placebos match R bit-exactly, SE within ±15% Placebo parity for trends_linear is intentionally skipped: R's per-path placebo re-runs on the path-restricted subsample with different control eligibility than Python's global-then-disaggregate architecture, so the divergence is methodological, not a tolerance issue. Internal regression covers placebo + trends_linear (finite values, bootstrap inheritance). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

dCDH by_path: lift trends_linear + trends_nonparam gates (Wave 3 #6+#7)

Closes BR/DR foundation gap igerber#6 from project_br_dr_foundation.md: BusinessReport and DiagnosticReport now name what the headline scalar actually represents as an estimand, for each of the 16 result classes. Baker et al. (2025) Step 2 ("define the target parameter") was previously in BR's next_steps list but not done by BR itself — this PR closes that gap. New top-level ``target_parameter`` block (additive schema change; experimental per REPORTING.md stability policy): { "name": str, # stakeholder-facing name "definition": str, # plain-English description "aggregation": str, # machine-readable dispatch tag "headline_attribute": str, # which raw result attribute "reference": str, # REGISTRY.md citation pointer } Schema placement: top-level block (user preference, selected via AskUserQuestion in planning). Aggregation tags include "simple", "event_study", "group", "2x2", "twfe", "iw", "stacked", "ddd", "staggered_ddd", "synthetic", "factor_model", "M", "l", "l_x", "l_fd", "l_x_fd", "dose_overall", "pt_all_combined", "pt_post_single_baseline", "unknown". Per-estimator dispatch lives in the new ``diff_diff/_reporting_helpers.py::describe_target_parameter`` (own module rather than business_report / diagnostic_report to avoid circular-import risk — plan-review LOW igerber#7). All 17 result classes covered (16 from _APPLICABILITY + BaconDecompositionResults); exhaustiveness locked in by TestTargetParameterCoversEveryResultClass. Fit-time config reads: - ``EfficientDiDResults.pt_assumption`` branches the aggregation tag between pt_all_combined and pt_post_single_baseline. - ``StackedDiDResults.clean_control`` varies the definition clause (never_treated / strict / not_yet_treated). - ``ChaisemartinDHaultfoeuilleResults.L_max`` + ``covariate_residuals`` + ``linear_trends_effects`` branches the dCDH estimand between DID_M / DID_l / DID^X_l / DID^{fd}_l / DID^{X,fd}_l. Fixed-tag branches (per plan-review CRITICAL igerber#1 and igerber#2): - ``CallawaySantAnna`` / ``ImputationDiD`` / ``TwoStageDiD`` / ``WooldridgeDiD``: the fit-time ``aggregate`` kwarg does not change the ``overall_att`` scalar — it only populates additional horizon / group tables on the result object. Disambiguating those tables in prose is tracked under gap igerber#9. - ``ContinuousDiDResults``: the PT-vs-SPT regime is a user-level assumption, not a library setting. Emits a single "dose_overall" tag with disjunctive definition naming both regime readings (ATT^loc under PT, ATT^glob under SPT). Prose rendering: - BR ``_render_summary``: emits "Target parameter: <name>." after the headline sentence (short name only; full definition lives in the full_report and schema). - BR ``_render_full_report``: "## Target Parameter" section between "## Headline" and "## Identifying Assumption". - DR ``_render_overall_interpretation``: mirror sentence. - DR ``_render_dr_full_report``: "## Target Parameter" section with name, definition, aggregation tag, headline attribute, and reference. Cross-surface parity: both BR and DR consume the same helper (the single source of truth), so their ``target_parameter`` blocks are byte-identical (verified by TestTargetParameterCrossSurfaceParity). Tests: 37 new (TestTargetParameterPerEstimator + TestTargetParameterFitConfigReads + TestTargetParameterCoversEveryResultClass + TestTargetParameterCrossSurfaceParity + TestTargetParameterProseRendering). Existing BR/DR top-level-key contract tests updated to include ``target_parameter``. Total 319 tests pass (282 prior + 37 new). Docs: REPORTING.md gains a "Target parameter" section documenting the per-estimator dispatch and schema shape. business_report.rst and diagnostic_report.rst note the new field with a pointer to REPORTING.md. CHANGELOG entry under Unreleased. Out of scope: REGISTRY.md per-estimator "Target parameter" sub-sections (plan-review additional-note); the reporting-layer doc in REPORTING.md is the current source of truth. A follow-up docs PR can land those sub-sections if maintainers want the registry to own the canonical wording directly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

) The CI AI reviewer's quality has notably degraded since igerber#404 and igerber#415 landed. This restores 5 files to the snapshot at fe80295 (parent of igerber#404's merge, i.e. the last commit on main before either PR landed): .github/workflows/ai_pr_review.yml -- reinstates openai/codex-action@v1 with model: gpt-5.4 and effort: xhigh .github/codex/prompts/pr_review.md -- removes Single-Pass Completeness Mandate (igerber#404) and Audit igerber#6 "Claim-vs-shipped" + tightened verdict bar (igerber#415); restores the original 179-line prompt .claude/scripts/openai_review.py -- drops --ci-mode (igerber#415) and gpt-5.5 PRICING/reasoning entries (igerber#404); DEFAULT_MODEL back to gpt-5.4 .claude/commands/ai-review-local.md -- restores skill doc to match the restored script tests/test_openai_review.py -- restores test suite to match (152 tests pass at the snapshot) igerber#404's body documented the rollback criterion ("if >2 of next 5 PRs surface new P1+ findings on unchanged code in round 2, revert the model bump"). This invokes that rollback and extends it to also revert the Codex -> Python single-shot backend switch from igerber#415. Out of scope: open branch ci-workflow-ipynb-markdown-extraction (PR igerber#414, unmerged) also touches openai_review.py and pr_review.md; that branch will need to rebase against the older base after this lands. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ueue + example refs One informational P3 from CI codex R2 — METHODOLOGY_REVIEW.md still described ContinuousDiD as "In Progress" in two surrounding surfaces even after the status-table flip, creating conflicting status signals. Fixed both sites: 1. L27 explanatory paragraph: removed the ContinuousDiD example from the In Progress band's "has methodology file but no paper review" illustration (it's now Complete). 2. L1289-1292 Priority Order queue: removed entry #9 (ContinuousDiD) and renumbered the remaining queue. Retroactive fix per feedback_changelog_accuracy_fixes (CI review catching one factual error in the queue means scanning for the same mistake): PR #473 promoted HeterogeneousAdoptionDiD to Complete but left entry #6 (HAD) in the same In Progress queue. Removed HAD's entry too and renumbered, so the queue is now self-consistent with the status table for all Complete entries. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Closes the WooldridgeDiD (ETWFE) methodology-review-tracker promotion in METHODOLOGY_REVIEW.md (In Progress → Complete), following the primary-source review for Wooldridge (2025) merged in PR-A (#484). Adds two paper-driven implementation surfaces and extends R-parity goldens to the nonlinear paths. Implementation: - `aggregate(weights="cohort_share")` on WooldridgeDiDResults implements paper Eqs. 7.4 (simple-overall) and 7.6 (event-time, restricted to k>=0) cohort-share aggregation weights as an opt-in alternative to the default cell-count weighting (matching Stata `jwdid_estat`). Inference fields fail-closed to NaN with UserWarning per paper Section 7.5 conditional-on-shares semantics; raises on `survey_design` (design-consistent totals deferred); raises on `type ∈ {"group","calendar"}` (no paper closed-form); raises on bootstrap fits (no matching bootstrap variant). Closes TODO row 95. - `cohort_trends=True` on `WooldridgeDiD.__init__` adds linear `dg_i · t` cohort-specific trend interactions (paper Section 8 / Eq. 8.1) for the OLS path. Rejects on logit/poisson per paper Section 8 OLS scope; rejects on survey_design pending full-dummy/TSL validation; enforces per-cohort pre-period identification check (≥ 2 observed pre-periods per treated cohort). Auto-routes to full-dummy mode regardless of vcov_type. Closes the PR-A Requirements Checklist heterogeneous-trends gap. Tests: - `tests/test_methodology_wooldridge.py` extended with 6 paper-equation-numbered methodology classes (Theorem 3.1, Proposition 5.1, Section 6 event study, Section 7 aggregation paths, Section 8 heterogeneous trends, Section 10 unbalanced panels) + `TestW2025LibraryDeviations` consolidating 5 surviving deviations. Mirrors the HAD PR #473 precedent. - Two new R-parity surface classes (`TestWooldridgeParityRPoisson`, `TestWooldridgeParityRLogit`) lock the structural surface against R `etwfe(family=...)` log-link goldens. - 209 tests total (60 methodology + 149 R-parity + unit regressions). R Goldens: - `benchmarks/R/generate_wooldridge_golden.R` extended with Poisson + logit DGPs via R `etwfe`; augmented panel CSV retains the same seed-generated `y_pois` + `y_logit` columns for cross-language reproducibility. - `benchmarks/R/requirements.R` pins `etwfe >= 0.5.0`. Tracker promotion: - METHODOLOGY_REVIEW.md L52 status flip with merge date; detail section L583-605 rewritten to the Verified Components / Test Coverage / Corrections Made / Deviations / Outstanding Concerns template mirroring HAD / ContinuousDiD / DCDH. L27 example re-pointed; priority queue items #7-#10 renumbered to #6-#9. - REGISTRY.md `## WooldridgeDiD (ETWFE)` extended with `### Deviations from the paper / from R / library extensions` block consolidating 7 surviving deviations + opt-in notes for cohort_share + cohort_trends + survey rejection + bootstrap cohort_share rejection contracts. - CHANGELOG.md `[Unreleased]` `### Added` documents the new parameters, R-parity extension, and tracker flip. - `docs/methodology/papers/wooldridge-2025-review.md` Requirements Checklist + Gaps & Uncertainties items 1 + 11 marked `**Status:** Closed in PR-B`. - `docs/api/wooldridge_etwfe.rst` updated with weighting-scheme notes alongside the existing aggregation table. Second of two PRs for the WooldridgeDiD methodology-review-tracker promotion. PR-A merged at e416aed (#484). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…ment criteria The CI AI reviewer (openai/codex-action@v1) was missing P2 findings the local single-shot Responses API path consistently catches. Smoking gun on PR igerber#412 SHA 7f7e3d5: CI Codex returned a clean verdict with zero P2 findings; the single-shot path on the same SHA flagged 3 real P2 gaps (docstring drift, missing survey-composition tests, REGISTRY cross-doc inconsistency), all real, two of which the user shipped a fix for in the next commit. Three coordinated changes: - Replace the openai/codex-action@v1 step in ai_pr_review.yml with a Python step that invokes .claude/scripts/openai_review.py in a new --ci-mode. Reuses the proven single-shot architecture instead of a parallel pipeline that drops findings. Workflow renamed from "AI PR Review (Codex)" to "AI PR Review". Existing  comment markers preserved for backward compat with historical PR canonical comments. - Add a directive Audit igerber#6 ("Claim-vs-shipped audit") to the Single-Pass Completeness Mandate in pr_review.md and its local-mode substitution in openai_review.py. Actively cross-references REGISTRY / CHANGELOG / PR-body claims against implementation, tests, public docstrings, rendering surfaces, and cross-doc consistency, flagging absences per the deferral rule. Required because the original prompt let reviewers enumerate what exists without auditing what was claimed but missing. - Tighten assessment criteria so unmitigated P2 findings produce a "Needs changes" verdict (not "Looks good"), with explicit "must enumerate" wording preventing silent skips and a targeted carve-out preventing TODO.md from laundering shipped-behavior test gaps to P3. Mitigation via REGISTRY.md Notes or pre-existing TODO entries remains available for non-shipped-claim P2s. The script's _SUBSTITUTIONS list is split into _LOCAL_FRAMING_SUBSTITUTIONS (9 PR -> code-change tuples, applied only in local mode) and _MANDATE_SUBSTITUTIONS (1 tuple, applied to all single-shot uses since neither has tool access). New CLI flags --ci-mode, --pr-title, --pr-body plumb PR context into a "## PR Context" prompt section in CI mode. PR body wrapped in <pr-body untrusted="true">...</pr-body> with a case/whitespace-tolerant regex stripping any literal closing tags. Pre-merge verification on PR igerber#412 SHA 7f7e3d5 with the modified script (--ci-mode --full-registry --context standard --model gpt-5.5): verdict "Needs changes"; surfaces all 3 of the smoking-gun P2s. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…GELOG H1 scope - R6 fix left METHODOLOGY_REVIEW.md Deviations item igerber#6 stale (only updated the Verified-Components row). Item igerber#6 still said "Eq. 18 linear-trend- detrended joint Stute deferred". Rewritten to match the rest of the HAD tracker: trends_lin=True is SHIPPED + R-parity-locked in test_did_had_parity.py; the methodology-walkthrough file deliberately doesn't duplicate that coverage; the Pierce-Schott published-value numerical replication is what's waived (Deviations Note igerber#3). - R6 narrowed the Verified-Components row + class docstring but missed the CHANGELOG bullet, which still claimed "joint Stute pre-trends + homogeneity H0 fail-to-reject + H1 reject under nonlinear DGP". Narrowed to: "H0 fail-to-reject on both surfaces and H1 reject for joint homogeneity under a nonlinear DGP" — matches the test file's actual scope. All 35 methodology tests pass; lint clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Flip the ChaisemartinDHaultfoeuille (DCDH) row from In Progress to Complete. Adds the Verified Components / Test Coverage / Corrections Made / Deviations / Outstanding Concerns detail section mirroring the ContinuousDiD (PR igerber#476) and HAD (PR igerber#473) precedents. Consolidates 7 DCDH deviations from the paper, from R DIDmultiplegtDYN, and library extensions into a labeled REGISTRY surface per the AI-review "Documenting Deviations" convention. CHANGELOG [Unreleased] gains a new Added entry. L27 In Progress example re-pointed to WooldridgeDiD; L1289 priority-order queue item igerber#6 removed and items igerber#7-igerber#11 renumbered to igerber#6-igerber#10. No source code changes, no new tests, no new docstrings — documentation consolidation only. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

igerber merged commit 88c6542 into main Jan 2, 2026

igerber deleted the claude/add-multi-period-support-OhFL8 branch January 2, 2026 15:40

igerber mentioned this pull request Apr 20, 2026

Close BR/DR gap #6: target-parameter block in schemas #347

Merged

igerber added a commit that referenced this pull request Apr 22, 2026

Merge pull request #347 from igerber/br-dr-target-parameter

8999baf

Close BR/DR gap #6: target-parameter block in schemas

igerber mentioned this pull request Apr 26, 2026

dCDH by_path: lift trends_linear + trends_nonparam gates (Wave 3 #6+#7) #393

Merged

6 tasks

igerber added a commit that referenced this pull request Apr 26, 2026

Merge pull request #393 from igerber/dcdh-by-path-trends

c6cb864

dCDH by_path: lift trends_linear + trends_nonparam gates (Wave 3 #6+#7)

This was referenced May 10, 2026

Replace CI Codex agent with single-shot Responses API + tighten verdict bar #415

Merged

Revert AI review CI to Codex + gpt-5.4 (reverts #404, #415) #416

Merged

igerber mentioned this pull request May 21, 2026

ChaisemartinDHaultfoeuille (DCDH) methodology-review-tracker promotion: In Progress -> Complete #481

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add multi-period DiD support#6

Add multi-period DiD support#6
igerber merged 1 commit into
mainfrom
claude/add-multi-period-support-OhFL8

igerber commented Jan 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

igerber commented Jan 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants